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Abstract. The Covariance Matrix Adaptation Evolution Strategy (CMA- 
ES) has been the most successful Evolution Strategy at exploiting covari- 
ance information; it uses a form of Principle Component Analysis which, 
under certain conditions, is suggested to converge to the correct covari- 
ance matrix, formulated as the inverse of the mathematically well-defined 
Hessian matrix. However, in practice, there exist conditions where CMA- 
ES converges to the global optimum (accomplishing its primary goal) 
while it does not learn the true covariance matrix (missing an auxiliary 
objective), likely due to step-size deficiency. These circumstances can 
involve high- dimensional landscapes (n > 30) with large condition num- 
bers (^ > 10*). This paper introduces a novel technique entitled Forced 
Optimal Covariance Adaptive Learning (FOCAL), with the explicit goal 
of determining the Hessian at the global basin of attraction. It begins 
by introducing theoretical foundations to the inverse relationship be- 
tween the learned covariance and the Hessian matrices. FOCAL is then 
introduced and demonstrated to retrieve the Hessian matrix with high 
fidelity on both model landscapes and experimental Quantum Control 
systems, which are observed to possess a non-separable, non-quadratic 
search landscape. The recovered Hessian forms are corroborated by phys- 
ical knowledge of the systems. This study constitutes an example for nat- 
ural computing successfully serving other branches of natural sciences, 
and introducing at the same time a powerful generic method for any 
high-dimensional continuous search seeking landscape information. 

Keywords: Evolution Strategies, Covariance-Hessian relation, FOCAL, Hessian 
learning, experimental optimization, Quantum Control experiments, robustness. 

1 Introduction 

Automated learning of the Hessian matrix about the global optimum in the pres- 
ence of noise and without any derivative measurement constitutes a challenging 
task. Knowledge of second-order information at the optimum is very desirable 



as it may offer a measure of system robustness, e.g., to noise, allow for dimen- 
sionality reduction, and assist in landscape characterization. Within the Com- 
putational Intelligence community, Hessian learning is occasionally associated 
with the operation of Evolution Strategies (ES) Pj, powerful bio- inspired search 
heuristics, which excel in high-dimensional search of continuous landscapes. The 
advent of modern ES, also known as Derandomized Evolution Strategies (DES) 
[2], facilitates efhcient global optimization with small population sizes, subject to 
real-time statistical learning of the successful consecutive steps. Following a series 
of successful DES variants, formulated throughout the early 1990's |3|4|5| , the 
Covariance Matrix Adaptation Evolution Strategy (CMA-ES) emerged |6I7| as a 
state-of-the-art Evolution Strategy. It has become a successful global optimizer, 
outperforming other stochastic search heuristics in various reported competitions 
(see, e.g., [819] ). being increasingly employed in a long list of real-world appli- 
cations [TU], and having extensions to multi-objective Pareto optimization [TT], 
uncertainty handling [T^, niching I13ll4j . to mention a few. At the same time, 
this popular algorithm was also simplified in a competing strategy (the CMSA; 
see |15j). and was even further improved for certain cases of global optimization 
[T5] . The CMA-ES targets an efficient on-the-fly learning of the optimal mutation 
distribution by applying Principal Component Analysis (PCA) to the successful 
search-variations within a population of candidate solutions. Toward this end, it 
constructs a covariance matrix that uniquely dictates the operation of the mu- 
tation mechanism (see Fig. [l] for an illustration). Most ES, and particularly the 
CMA-ES, have long aimed to learn a covariance matrix associated with the Hes- 
sian matrix. Although this behavior has never been rigorously addressed, it has 
been implicitly expressed in conventional ES literature (see, e.g., |17|). A crucial 
question is whether the constructed covariance matrix is always related to the 
mathematically well-defined Hessian matrix, or rather simply reflects the point 
dispersion of a stochastic search path guided by selection pressure. Another issue 
is whether there are scenarios where the CMA-ES accomplishes convergence to 
the global optimum, while it does not learn the true covariance matrix during 
this process. The goal of the current work is thus three-fold: 

1. Establish rigorous principles for the relationship between the inverse Hessian 
matrix and the covariance matrix that is learned by an Evolution Strategy. 

2. Illustrate and explain scenarios where the CMA-ES does not learn a covari- 
ance matrix reflective of the inverse Hessian matrix at the global optimum. 

3. Present a new method, relying on existing DES mechanisms but rather pos- 
sessing a shifted focus toward landscape learning, with the explicit goal of 
Hessian determination at the global basin of attraction. This novel technique, 
entitled Forced Optimal Covariance Adaptive Learning (FOCAL), primarily 
targets practical experimental optimization scenarios. 

The specific application that motivated the development of FOCAL is the 
Quantum Control (QC) field, which aims at coherently altering quantum dy- 
namics phenomena typically by means of optimally shaped ultrafast laser fields 
[18119120) . Following the formulation of Quantum Control Theory (QCT) [H], 
its laboratory realization was achieved by means of closed-loop experiments [22] 



leading to a growing list of applications in a broad range of research topics (see 
PU] and references therein). In recent years the interest in the QC field grew also 
within the Computational Intelligence community, likely due to the popularity 
of Evolutionary Algorithms as the routinely employed optimization heuristics in 
closed QC learning loops (see, e.g., |23I24I2512B] ). in parallel to the growing un- 
derstanding of QC search landscapes }27|28| . Furthermore, in the context of the 
current study, various DES routines also started to be deployed in the optimiza- 
tion of QC systems |29l30l31j . along with the successful application of CMA-ES 
in experimental QC single-objective |32I33I34] and multi-objective [33] systems. 

The remainder of the manuscript is organized as follows. Section [2] will in- 
troduce the fundamental concepts behind the process of covariance learning and 
Hessian determination. It will review the ES machinery, especially in light of 
the mutation operator and covariance matrix learning, and outline the theoret- 
ical principles for the inverse relation between the covariance and the Hessian 
matrices. Section [3] will discuss the limitations of Evolution Strategies, and in 
particular of the CMA-ES, to accomplish successful learning of the inverse Hes- 
sian matrix, and will illustrate this behavior under certain conditions. We shall 
then present our proposed scheme, the FOCAL algorithm, and discuss its mech- 
anism in detail. A preliminary proof of concept will be then demonstrated on a 
noisy model landscape. This will be followed by the description of the experi- 
mental systems under study in Section[4]and the practical observation in Section 
O We will discuss our results and summarize this work in Section [6l 
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Fig. 1. CMA-ES optimization on a 2-dimensional model landscape. The various el- 
lipsoids define equiprobability sampling contours at different locations on the search 
landscape. Principal Component Analysis of directions of the highest yield improve- 
ments allows for continual refinement and coordinate rotation of these distributions in 
order to generate an optimal sampling distribution. 




2 From Covariance Learning to Hessian Determination 



This section will provide some background on the ES machinery, and especially 
on the mutation operator and its statistically learned covariance matrix. We will 
then examine the foundation of the inverse relationship between the covariance 
matrix and the Hessian matrix at the top of the landscape. 

2.1 Covariance Utilization in Evolution Strategies 

ES uses stochastic continuous variations for the mutation operation which are 
based upon the multivariate normal distribution. The latter is well defined by a 
covariance matrix. Furthermore, in non-elitist DES strategies, every generation 
concludes with a single search point, referred to as the parent, either by means 
of a single selection in a (1, A)-strategy, or by means of recombination in a (/i, A)- 
strategy. Let the search seek the global maximum in a landscape of dimension n. 
Given the parent in generation g, x^^\ DES generate offspring xf~^^\ fc = 1 . . . A, 
according to the following characteristic equation: 

4^'+i)=a;(f)+CT(9'-A4(o,C(f') fc = l...A, (1) 

where cr(9) is the global step-size, and C*-^-* is a covariance matrix defining the mu- 
tation distribution. There are three canonical correlation mutation schemes, cor- 
responding to three distinct forms of the covariance matrix (for a broad overview 
we refer the reader to [Ml : note that in our notation, unlike the Standard ES, the 
global step-size and the covariance matrix are considered as separate entities): 

(A) A constant identity matrix, i.e., adaptation exclusively relies on a step-size 
mechanism to update the global step-size: 

Cq = I = const (2) 

(B) A diagonalized covariance matrix, i.e., a vector of n independent strategy 
parameters, presumably comprising the variances of the decision variables, 
typically referred to as the individual step-sizes: 

Cfa = diag (Ai, A2, . . . , A„) = diag (VAR(a:i), VAR(a;2), . . . , VAR(2;„)) (3) 

(C) A general non-singular covariance matrix with arbitrary (n • (n + 1)) /2 in- 
dependent strategy parameters: 

C, = (c„-) = RAR^, (4) 

introducing an eigendecomposition by means of an orthonormal coordinate 
rotation matrix R and the eigenvalue matrix A. 

A geometrical interpretation of the three schemes is given in Fig.[2j In the context 
of this study, only case (C) may offer an inversion to the fully-formed Hessian 



matrix, whereas case (B) may possibly constitute a compressed inverse form of 
the Hessian eigenvalues. Upon realization of these three schemes, this study will 
consider the defauh CMA-ES ,7 for case (C) (denoted by def-CMA-ES), the 
sep-CMA-ES [37^ for case (B), and the CMA with a constant unit matrix for 
case (A); The latter is a zero-order DES which generates isotropic mutations 
and employs the Cumulative Step-Size Adaptation (CSA) mechanism ,38J as its 
adaptation mechanism, and will be referred here as the iso-CMA-ES. Moreover, 
we shall consider the (/J,vi/,A) strategy, i.e., a non-elitist strategy generating A 
search-points, amongst which it recombines the /i with the highest ranking in a 
weighted averaging. 
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Fig. 2. Equidensity mutation sampling ellipsoids corresponding to three covariance 
matrix forms in ES, depicted on a 2-dimensional model landscape. Panel (a) displays 
a constant identity matrix. Panel (b) uses 0{n) independent parameters to generate 
on-axis ellipsoids. Panel (c) utilizes ©(n^) parameters to create an arbitrary covariance 
matrix. 



2.2 Covariance versus Hessian: Principles of Inversion 

Since the development of Evolution Strategies, it has been implicitly assumed 
that the evolving covariance matrix approximates the inverse Hessian of the 
search landscape. This was primarily supported by the rationale that locating 
the global basin of attraction by an ES can be accommodated using mutation 
steps that fit the actual landscape, or, equivalently, that the optimal covari- 
ance distribution can offer mutation steps whose equidensity probability contours 
match the level sets of the landscape |39j . Another rationale behind this was the 
following: the reduction of a general problem to a spherical problem may be 
achieved by sampling search-points based upon a covariance matrix that is the 
inverse Hessian (note that this is equivalent to replacing the Euclidean metric 
by the Mahalanobis metric [H]). Since ES optimally operate on the quadratic 
sphere model, any successful ES run suggests that the covariance learning was 
accomplished. However, at the same time, there has been no rigorous analysis 



that explicitly demonstrates that ES machinery indeed learns the inverse of the 
Hessian. A study by Rudolph 39J proved the potential of ES to facilitate such 
learning and derived practical bounds on the population size toward the end 
of successful learning. The latter work also questioned the effectiveness of ES 
learning upon discarding past information and claimed that ES learning would 
benefit from introducing memory to the individuals. In retrospect, this has in- 
deed been achieved throughout the development of DES, which provides the 
basis to introduce FOCAL in the current study. 

We now analyze the inverse relation between the covariance and the Hessian 
matrices at the top of the landscape. This section will thus focus on the covari- 
ance matrix upon reaching the proximity of the global attraction basin, when 
subject to a self-adaptive Evolution Strategy. 

A covariance matrix amongst sets of variates is defined as 

Cij — (^XiXj) (-^i) (•^j) • 

By construction, the origin is set at the parent search-point, which is located 
at the optimum. The covariance elements are then reduced to the following 

expectation values: 

/ + 00 
XiXjV [x) dx, (5) 
-oo 

where V (x) is the probability distribution function (PDF), and the variables Xj 
correspond to point dispersion about the optimum. Revealing the nature of the 
PDF is necessary for the interpretation of the covariance matrix. Here, the sam- 
pling of the variate will be dictated by the selection mechanism of the Evolution 
Strategy, and upon considering a deterministic non-elitist (/j,, A)-strategy, the 
PDF depends only upon the objective function value. Since we assume that nei- 
ther penalties nor cost functions are employed, the PDF becomes the following 
functional of the yield: 

Vix)^r[J{x)]. (6) 

J (x) refers here to the distance in the objective space of the trial solution, x, to 
the global optimum location x* , 

J{x) = U.^- f{x) = fix*)- fix) 

i.e., it does not refer to the absolute yield value, but rather to possible yield 
deviations from the optimum. 

Importantly, the selection mechanism is blind to the location of the candidate 
solutions in the search space, and its solitary criterion is the ranked yield values. 
Consequently, the PDF is a function of the objective value alone, 

V = ViJ). 

The crucial claim is that V (J) follows the exponential distribution: 



V (J) ~ 7exp (—7 J) . 



(7) 



It is important to note that while the hne of reasoning leading to this distri- 
bution is general, the exponential distribution itself is strictly valid only in the 
immediate vicinity of the optimum. Elsewhere, the mutation-selection proba- 
bility distribution must be weighted by the probability of generating a given 
mutation; this supplemental probability distribution is monotonically decreas- 
ing with the yield J only at the optimum, which prevents it from significantly 
distorting the exponential selection distribution. In what follows, we show the 
correctness of the claim from two different perspectives. 



The Memoryless Property Given the deterministic ranking of the A can- 
didate solutions per generation, {xk-.x}^^^, based upon their ascending ranked 
distance to the global optimum {Jk:\}k=i^ we examine the selection probability 
{Ji:\) corresponding to the trial solution Xi;\. Upon determination of the best 
individual in the current generation, Xi;x, the selection probability of the i*^ 
individual depends only upon its relative distance in the objective space to Ji;x, 
denoted as AJi i = Ji-x — Ji.x- 

V{J^..x\Ji:x)=V{AJ,^i). 

Consider Bayes' theorem, 

V {Ji..X \Jl:x) = V {Jv.X \Ji:x) ' Jt^, 

and note that P {Jux \ Ji:X ) = Ij which allows us to conclude 



V{Ji..x) = V{AJi,i)-V{Ji..x). (8) 

Note that this functional equation is only satisfied with the exponential dis- 
tribution. In other words, the selection probability for a given individual has 
no memory of its absolute distance to the global optimum, but rather depends 
upon its relative distance to other candidate solutions of the same generation, 
or equivalently, we may state that the variable Ji;x is memoryless with respect 
to the global optimum J* = 0. Since the exponential distribution is the 
only continuous memoryless random distribution, V (J) must necessarily 
follow it. 



The Defining Equation Upon increasing the value of J, the corresponding 
rank of the trial solution decreases. This directly translates into the requirement 
that the probability distribution function should be monotonically decreasing 

when increasing J values, i.e., ^ < 0. A simple way to satisfy this requirement 
is by considering its derivative by means of a positive weighting function w {J): 



(9) 



with the foUowing boundary conditions, 



w(J = 0) = l 

(10) 

w ( J oo) 0, 

and where the scalar accounts for the appropriate scaUng. Note that a Gaussian 
profile is already ruled out at this point, since its first derivative vanishes at the 
origin. In addition, the weighting function w{J), which may be interpreted as 
the mutation sensitivity, is monotonically decreasing (fj < 0) according to the 
following qualitative rationale: larger yield decrease is more likely to occur upon 
perturbing higher ranked solutions, i.e. low J values, when compared to pertur- 
bation of lower ranked solutions. Following the same arguments upon which we 
constructed w (J), we may generate an equivalent weighting function with the 
appropriate properties: 

- = -w(J). 

A simple way to satisfy these demands is by setting: 

w{J)^w (J) =^ w (J) - exp (-ry J) . (11) 

These recursive relations lead us to conclude that the parent function, V (J) is 
the exponential function, which is known to be the only mathematical function 
constituting its own derivative: 

V (J) - exp . (12) 

Obtaining the Hessian 

Given the conclusions of the previous section, we may rewrite Eq. [5]as follows: 

/+00 
XiXj exp [—J J {x)] dx. (13) 
-oo 

We focus here on the attractive basin of the global optimum, where the PDF 
allows continuous sampling while maintaining the yield values at the top of the 
landscape. Hence, J {x) may be Taylor- expanded about the optimum, 

J{x) c:i^x'^ -n-x, (14) 

which in combination with the normalized form of the exponential distribution 
yield: 

V[J{x)]^[^y^'\n\'^'exp(^-^jx'^-n-x^ (15) 

where |?^| is the Hessian's determinant. Upon substitution to Eq. [s] the covari- 
ance element reads 



r — 



x^XJ eiip (^-^-fx^ ■ H ■ x^ dx (16) 



Finally, the integration can now be completed to obtain the desired covariance 
element: 



3 FOCAL: Forced Optimal Covariance Adaptive Learning 

The statistical machinery of the CMA-ES has the necessary tools to obtain the 
covariance matrix of the decision variables in the presence of noise without any 
derivative estimation, and upon inversion, to attain the Hessian matrix at the 
global optimum. If the learning of the covariance matrix is successful upon con- 
vergence to the optimum, the evolution strategy may continue to generate muta- 
tions without affecting the target control yield (fitness) , since the extended sam- 
pling results in a large dispersion of points for insensitive search directions (i.e., 
small magnitude Hessian eigenvalues). At the same time, mutations that obtain 
significant yield deterioration are rejected, leading to a small point dispersion 
for highly sensitive search directions (i.e., large magnitude Hessian eigenvalues). 

The default CMA-ES does not necessarily operate in the manner above, and 
its covariance matrix is not always able to evolve to a form that properly reflects 
the structure of the underlying search landscape. The rationale for this deviation 
from its prescribed behavior is two-fold. First, at the practical level, the default 
covariance matrix learning rate is proportional to the reciprocal of the squared 
search-space dimensionality and therefore precludes meaningful learning in a 
timely manner for large search space dimensions, e.g., n > 30. More importantly, 
at a deeper conceptual level, since the covariance matrix adaptation is carried 
out simultaneously with the Cumulative Step-Size Adaptation (CSA) scheme 
[38j , upon approaching a basin of attraction the global step-size shrinks to zero 
and the exploration of the landscape is practically halted. There exist certain 
situations (e.g., landscapes with large condition numbers, ^ > 10^), in which 
this CSA behavior constitutes an obstacle to the CMA-ES learning the correct 
covariance matrix. As demonstrated later in Figures [3] and [Sj the default CMA- 
ES does not perform well in identifying a high-dimensional Hessian (n = 80) 
and does not learn the correct eigenspectrum. These observations correspond to 
simulations and experiments that were conducted within realistic environments 
of noisy decision (input) variables and a practical budget of function evaluations 
(~ 10'') and will be described in greater detail in what follows. 

The main idea behind the newly proposed FOCAL technique is to force land- 
scape exploration for the sake of statistical analysis, by preventing this stagna- 
tion via the global step-size update scheme. FOCAL is thus introduced as a 
modification to the CMA-ES for the sake of landscape learning, which can also 
be utilized in other DES procedures. It is important to note, as was previously 
suggested, that FOCAL's success is contingent on the search-engine's capabil- 
ity to reach the proximity of the global optimum. Failure to meet this criterion 
would naturally render FOCAL-based Hessian learning invalid. 

In this section we shall consider the CMA-ES as the search-kernel of the 
FOCAL method, representing case (C) in the mutation schemes outlined in Sec- 




(17) 



tion |2.1[ Since FOCAL is a generic method, alternative search-kernels may be 
employed. In this work we shall consider two additional kernels. The reduction 
to case (B), when a compressed landscape picture is sought in the form of a 
diagonalized matrix, is straightforward. Furthermore, the consideration of case 
(A) (zero-order DES with isotropic mutations) within the FOCAL context will 
constitute a reference routine, i.e., demonstrating the necessity for on-axis or 
arbitrarily rotated mutations. The success or failure to fulfill FOCAL's goals on 
a certain problem does not necessarily reflect the performance of the employed 
search-engine for the canonical global optimization of the same problem. It is 
possible that an equivalent search with the same kernel but with an alternative 
step-size mechanism would obtain a different outcome, e.g., locating the global 
optimum. This differentiation stems from the primary objective of FOCAL, that 
is, precise learning of the optimal sampling distribution, rather than yield max- 
imization as in canonical global optimization. 

3.1 FOCAL: Mechanism Description 

FOCAL introduces two main modifications to the default CMA-ES algorithm: 

— Adjusting the value of the covariance matrix learning rate, Ccov 

— Reformulating the global step-size update scheme 

In what follows, we will elaborate on each of those two components. 

The Learning Rate Ccov The default learning rate of the covariance matrix 
within the CMA-ES is proportional to the reciprocal of the squared search-space 
dimensionality, i.e., Ccov I/tt? [7j. It therefore precludes meaningful learning, 
especially when the search space dimensionality is large, e.g., n > 30. With this 
in mind, the learning rate is adjusted in FOCAL to allow contribution of at 
least several percent (order of 1% ~ 10%, i.e., Ccov — 0.01 ^ 0.10) to the co- 
variance update from the statistical analysis of successful mutations. Also, when 
considering the auxiliary factor H„ in certain implementations [7] it should be 
permanently fixed as H„ = 1. Setting Ccov becomes a user task, which should 
take into account the trade-off between convergence speed and convergence reli- 
ability; lower covariance contributions should allow for more robust convergence 
at the expense of additional function evaluations, and vice versa. A priori knowl- 
edge about the problem, such as the estimated rank of the target Hessian, may 
assist in this task. See the discussion about parameter settings in Section [5. 5| 

The Practical Step Upon updating the covariance matrix, its eigendecompo- 
sition yields a diagonal matrix A'^' , which comprises the variances of the rotated 
control variables: 

A(^)=diag(Al^\A(«\...,A(f)), (18) 
with the eigenvalues |Aj®^| sorted, i.e., A^^^ = AmL, = 



Let us consider the following difference vector: 



(^)vF = X!^* ' z.^Af (0,1 



1/2 

• {z)w 

(19) 



where cr(f) is the global step-size, and R^^^ is the orthonormal coordinate rotation 
matrix obtained by the routinely employed eigendecomposition of the covariance 
matrix. This vector constitutes the translation between the parent point to its 
offspring, and deriving its expected behavior will assist in explaining the rationale 
behind the FOCAL mechanism. We therefore choose to examine the Root-Mean- 
Square (RMS) statistics of this difference- vector, 



/W 



= ^ . y A(/) = — . tr f A(s)U — . tr f C(s) 

(20) 

where ^cS = 1/ J2i=i Hi recombined cross-terms vanistj^ Also, note 

that the trace is preserved through the eigendecomposition. 

Since ||Z\a;*^3^|| ~ ||z\a;(f^ we then define the practical step-size as the 
RMS: 



Si3^= Ax^9) = J(\\Axm''\ = . Jtr(C(9)) (21) 

^ RMS V \ / ^/m^ * 

This scalar thus reflects the interplay between the global step-size and the effec- 
tive sampling coverage, which is represented by the eigenvalue spectrum of the 
covariance matrix. In what follows we shall consider = 1. 

Bounding the practical step-size is of particular interest here. Subject to any 
step-size mechanism, the upper and lower bounds of the practical step-size may 
be inferred from Eq. 21 upon consideration of extreme scenarios of isotropic 



Since the sampled random variables are mutually independent, we have 



fc=i 



covariance matrices with radii of either Amin or A 



^^'^ ■ \/"-ALt < < ^^''> ■ ^Jn-X^L (22) 

At the top of the landscape, stable convergence is possible with a well adapted 
global step-size, which is likely to shrink to zero by means of the step-size mecha- 
nism - e.g., the CSA mechanism - regardless of the learned sampling distribution. 
The crucial point is that the practical step-size will vanish even with a poor 
sampling distribution around the optimum, as it is primarily governed by the 
global step-size: 

^(9) y =^ > 0, (23) 

where J denotes the distance to the global optimum in the objective space, 
as before. We stress that this does not constitute any fault in the algorithmic 
behavior; on the contrary, this is the desired convergence profile in canonical 
global optimization. At the same time, it precludes proper sampling about the 
optimum, and therefore no meaningful statistical learning may be carried out. 



The Characteristic Update In order to generate "sampling pressure" that 
prevents stagnation and forces continuous landscape exploration about the op- 
timum, the CSA mechanism has to be replaced. Upon attempting to set a lower 
bound s on the global step-size a'-s\ a rapid decrease of the covariance eigen- 
values is observed, eliminating again proper sampling about the optimum (Eq. 



e 



tr 



C(9) 







0. 



(24) 



J-s-O V / ,7^0 ^ .1^0 

In FOCAL, the CSA is replaced with a step-size update that always forces a finite 
practical step-size Jp^'' , regardless of the landscape location or mutation success. 
FOCAL imposes the strongest spectrum-dependent pressure on the practical 
step-size, while counteracting the decrease of the covariance trace, tr(C^^^), 
according to the following update: 



(25) 



1), the practical step now 



Si9) 
P 



(26) 



Upon substituting Eq. 25 into Eq. 21 (with /icff 
becomes 

y^tr(C(f)) 

(•^min) 

The scalar ctq is a constant user-specified forced step through search space, and 
the FOCAL learning power satisfies a < 1/2. According to this proposed ratio- 
nale, new offspring are always required to make a step in all axes, and the col- 
lected statistics allow refinement of the optimal sampling distribution required to 



maintain high fitness values. In order to stay at the top of the landscape, FOCAL 
forces discovery of an optimal sampling distribution that generates mutations 
orthogonal to yield-sensitive search directions. 

The operational mechanism of FOCAL and the role of the parameter a may 
be further understood upon consideration of the derived bounds for the practical 
step-size, iJp^-* , when the global step-size is updated subject to the FOCAL scheme 



(substituting Eq. 25 into Eq. 22): 



ao^n 



(27) 



The lower and upper bounds of the practical step-size Jp^"* are dependent upon 
the covariance spectrum when a < 1/2. The lower bound corresponds to mu- 
tations taken entirely parallel to the most yield-sensitive search direction (the 
smallest covariance eigenvalue, or alternatively the largest Hessian eigenvalue), 
whereas the upper bound corresponds to mutations taken parallel to the least 
yield-sensitive search direction (largest covariance eigenvalue, smallest Hessian 
eigenvalue) . 

In order to minimize the practical step in critical search directions and there- 
fore prevent significant yield deviations, the proper covariance matrix must be 
learned. As the estimation of the covariance matrix improves (i.e., Amin — ^ 0), 
the occurrence of significant yield deviations diminishes as mutations predomi- 
nantly occur in the search null-space directions. The spectral dependency, with 
a < 1/2, generates pressure to learn the optimal covariance, since its removal 
will make the practical step-size identical in all directions. Furthermore, the 
condition number, 



cond 



A 



, (28) 

mill 



rapidly grows as the estimation of the covariance matrix improves, which may 
generate significant steps outside of the quadratic regime. Thus, the spectrum 
dependency also plays the role of taming this rapid expansion of the practical 
step-size upper bound. The optimal value of a is likely problem-dependent and 
can affect the FOCAL learning rate. 



Inversion and Regularization Upon termination of the forced exploration 
about the global optimum, the Hessian may then be recovered by inversion of 
the FOCAL-optimized covariance matrix, i.e., 

where the eigenvectors R*^-^^ are shared by both the covariance and Hessian, and 
as previously mentioned, the Hessian spectrum is inversely proportional to the 
covariance spectrum. However, because the optimal covariance matrix is rank 



deficient, as evidenced by the ability to place mutations in the search null-space 
directions, this ill-conditioned matrix has first to be regularized prior to its in- 
version. It is worth noting that measurement uncertainty due to experimental 
noise may act as an implicit sclf-regularization procedure, and thus the covari- 
ance matrix only asymptotically approaches a singular form. In our proposed 
method, the Tikhonov filtering method is utilized for explicit regularization of 
the covariance spectrum ,40.41] to yield the target Hessian spectrum, 

h. = {XT'y' = Jt-, e^lO-'. (29) 



Climbing the Landscape: Step-Size Alternatives As noted earlier, FO- 
CAL critically relies on its driving search-engine to climb to the top of the 
landscape, where it can start launching its exploration - i.e., it is essential for 
the underlying search-engine to succeed in order for FOCAL to function. As will 
become evident in Section [5] the proposed method is extremely efficient in QC 
experiments, which are the focus of this study. It is possible that other search 
landscapes may not allow for an efficient climbing phase with the FOCAL rou- 
tine, but would necessitate alternative step-size mechanisms, such as the CSA or 
others JW . In those cases, it is recommended to employ a DES search-engine for 
the climbing phase, and switch-on FOCAL once the search reaches the proxim- 
ity of the global optimum, based upon a step-size criterion (e.g., once the global 
step-size reaches low values in the order of cr*-^^ sa 10^^). The FOCAL method 
is summarized as Algorithm [l] 

3.2 Preliminary Proof of Concept: A Noisy Model Landscape 

As a preliminary demonstration of FOCAL operation, we consider a model land- 
scape with noisy decision (input) variables, known as the separable ellipse: 

n 

/e (a;) = • {x, +K {0,el)y min (30) 

1=1 

where ^ is the condition number, set here to a high value of 10^. The fact that 
this model landscape is a separable test-case does not constitute a limitation on 
our proof of concept, as locating the global optimum is a necessary condition 
for FOCAL operation. By generating a challenging non-separable instantiation 
of this problem, e.g., upon increasing the condition number and applying both 
translation and rotation (comparable to [12]), the success-rate of the employed 
DES kernel is simply reduced, and FOCAL's ability to recover landscape in- 
formation is hampered respectively, as explained in the previous section. Since 
we would like to demonstrate the operation of FOCAL from the bottom of the 
landscape, including the climbing-up phase, we will utilize the specified test-case. 
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initDESO 

initParams (ccou, o"o, oi) 
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for fc = 1 . . . A do 
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until stopping criterion is met 

C regularize ^C'^'j {Tikhonov Filtering} 

return H 
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Numerical Results Fig. [3] presents the outcome of FOCAL operating on the 
80-dimensional noisy separable eUipse with the def-CMA-ES search-engine. The 
numerical results clearly show that FOCAL obtains the correct Hessian informa- 
tion with satisfying accuracy. This outcome is in contrast to the default CMA-ES, 
which in this case fails to learn the correct eigenspectrum. 



4 Systems under Study: Quantum Control Experiments 

We demonstrate the operation of FOCAL on various experimental systems that 
require high-dimensional continuous optimization {n = 80), and at the same 
time, can benefit from Hessian determination about their global optima for sen- 
sitivity analysis, dimensionality reduction, mechanism investigation, etc. This 
section will present the systems under investigation in the field of experimental 
Quantum Control. We will begin with a short introduction to this field, and 
especially with an overview on its practical optimization aspects. 



4.1 Optimization in Quantum Control 

The primary laboratory operation in closed learning-loops within QC systems 
is the shaping of the temporal electric field, E (t) , which completely determines 
the dynamics of any controlled quantum process as dictated by the Schrodinger 
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Fig. 3. Application of FOCAL to the separable ellipse (Eq. 
Ej; = 0.025, num/evals: 30,000). The recovered FOCAL spectrum (bars) accurately 
reproduces the analytical form (curve), in contrast to the inaccurate recovery of the 
default CMA-ES (stems). The FOCAL parameters were set to ao = 0.075, c^ov = 0.04, 
a = 0.10. 



equation. Determining E (t) thus typically constitutes the optimization task. In 
practice, most QC experiments are carried out by means of spectral modulation, 
where the control function consists of the spectral amplitude A{uj) and phase 
(j){u}) functions, which together construct the temporal electric field: 

E{t) = |y M^) exp{i(j){uj)) exp{-iujt) dwj . (31) 

Most QC processes are very sensitive to the phase, and phase-only shaping is 
typically sufficient for attaining optimal control. Our experiments include phase 
modulation only, where the spectral amplitude A{u)) is fixed. The latter func- 
tion is approximated well by a Gaussian and determines the bandwidth, or the 
minimal temporal pulse duration. Shaping the field with phase-only modulation 
guarantees conservation of the pulse energy. The spectral phase (/"(w) is defined 
at n frequencies {wj}"^]^ that are equally distributed across the bandwidth of the 
spectrum. These n values, {0(a;j)}"^j^, correspond to the n pixels of the pulse 
shaper and are the decision parameters to be optimized in the experimental 
learning loop: 

= (</)(wi),</)(u;2),...,</'(c^„)). (32) 

The phase function is subject to periodic boundary conditions within [0, 27r]". A 
practical note with regard to these boundary conditions will be given in the fol- 
lowing section. The pulse shaping process is implemented by a so-called Spatial 
Light Modulator (SLM), which is typically based on Liquid Crystal (LC) tech- 
nology. This approach thus considers n individual pixels, subject to rectangle- 
activation- functions, to construct the phase function 4>{uj). Figure |4] provides an 
illustration of an experimental QC learning loop. 



4.2 A Practical Note on Derandomization Upon Restricted 
Mutations 

In order to effectively address the periodic nature of the decision variables, a 
special treatment enforcing the boundary conditions (see [2] P- 179) can be incor- 
porated into the optimization scheme. Overall, there are three typical scenarios: 

— Allowing unbounded free search (default optimization). 

— Rejecting offspring which violate the boundary conditions [0, 27r]". 

— Employing a wrapping operator^ i.e., 4i{uj) ^ (/)(a;)mod27r, immediately after 
the mutation procedure. 

The default unbounded search typically performs well in terms of attaining a 
good QC yield, but its learning rate is slow and overall performance inefficient. 
At the same time, it obtains phase solutions that are hard to interpret, as they 
are stretched over large scales, far beyond the 27r domain. Most importantly 
in our context, the learned correlations between the decision variables are not 
physical, and it therefore cannot be employed in the FOCAL scheme. As for 
the other two procedures, the rejection concept clearly solves the boundary con- 
dition problem, while potentially hampering the progress rate when discarding 
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Fig. 4. The experimental Quantum Control learning loop. An unshaped pulse, ap- 
proximated well by a Gaussian, is shaped by a pixel-based modulator and shined on a 
molecular sample. The decision variables addressed by the evolution strategy are the 
individual pixels that dictate the shaping. The measured signal reflecting the molecular 
response, which is provided by the detector, constitutes the feedback to the evolution- 
ary optimizer. 

already evaluated candidate solutions; in contrast, the wrapping operation offers 
a simple solution which directly treats the physical issue, without losing evalu- 
ated offspring. However, upon wrapping a newly generated offspring, it is critical 
within the derandomization framework to simultaneously update the mutation 
vector which led to this candidate solution. For instance, the a posteriori mu- 
tation vector may be easily reconstructed within the CMA-ES in the following 
manner: 



We thus adopt this proposed wrapping scheme. 
4.3 Implementation 

In the following sections we describe in detail our experimental results upon de- 
ploying FOCAL to the two QC experimental systems. A commercial Ti:Sapphire 
femtosecond laser system generates amplified pulses centered at 780nm with a 
bandwidth of AX ~ 36nm, which gives pulses of duration At ~ 30fs full-width- 
half-maximum (FWHM). These pulses are delivered to a pulse shaper with a 
programmable 640 pixel SLM for phase-only modulation. The search is con- 
ducted across an n = 80 dimensional space of spectral phases (SLM pixels are 
tied together in groups of eight). 




(33) 



4.4 Rank-Deficient Problem: Atomic Rubidium 

The first experimental system to wliicli FOCAL is applied possesses a rank- 
deficient Hessian, and involves atomic Rubidium (Rb) [13]. In this process, a 
shaped pulse induces atomic transitions in Rb, while covering multiple resonant 
pathways, after which a radiative decay to the ground state produces visible 
fluorescence, which serves as a measure of the excited state population and forms 
the feedback signal. This is a simple two-photon process, with a rank-deficient 
global maximum. 

Laboratory Realization Following phase-only shaping, the laser pulses are focused 
into a 25mm vapor cell of atomic Rb maintained at 90°C. The visible fluorescence 
is collected in a direction orthogonal to the incident beam. 

4.5 Pull-Rank Problem: Second Harmonic Generation (SHG) 

Second harmonic generation (SHG) or frequency doubling is a two-photon process 
in which an electric field interacts nonlinearly with a material and generates 
an output photon with the combined energy of two input photons. The total 
energy of the output light is proportional to the integrated squared intensity of 
the primary pulse, as expected from a second-order nonlinear optical process. 

The time-dependent profile of the laser field is given in Eq. [31] The Total- 
SHG signal is then specified by: 



i.e., integration of the squared intensity. SHG is a common test-case in the 
laboratory, as it constitutes a useful indication of the pulse intensity, and its 
investigation contributes to the understanding of other processes. Moreover, its 
attractiveness also lies in its complete mathematical formulation. 

Optimization: Global Maximum In the context of global optimization, the max- 
imization problem of the Total-SHG signal is often considered as a typical cal- 
ibration task, and has been previously reported [32 . The Total-SHG signal St 
is maximized by any linear phase function of frequency, and in particular by a 
constant phase, i.e., 



Despite the simplicity of the optimal phase function, the fitness landscape has 
been shown to have a highly complex structure [44]. 

Laboratory Realization The Total-SHG signal is observed by focusing amplified 
pulses onto a 20 /im type-I BBO crystal, and the actual signal is recorded with 
a photodiode and boxcar integrator. 




(34) 



argmax^(„-) {St (0(a;))} = a • w + b Va,b. 



(35) 



5 Practical Observation 



5.1 Atomic Rubidium 

The forced step CTq is set to 7.5% of [0, 27r], a = 0.25, and a (10,20) strategy is 
employed. Figure [5]ja) presents the fitness curves of the three FOCAL routines 
when appHed to the optimization of the atomic Rb system. The learning curves 
of the def-CMA and sep-CMA practically merge and rise above that of the iso- 
CMA following ~ 1000 experimental evaluations. Both search-engines continue 
to improve their fitness values and upon convergence 7500 experiments), fit- 
ness variations induced by either experimental noise or poor mutations diminish. 
Conversely, the iso-CMA optimization does not realize significant improvement 
after ^ 1000 experiments, and only achieves ^ 85% of the fitness obtained by 
either the def-CMA or the sep-CMA. Prominent signal fluctuations are visible 
and the iso-CMA also suffers from large fitness value drops. The substandard 
and unstable performance of iso-CMA-driven FOCAL indicates an inability to 
generate mutations solely in search directions that are yield- insensitive (i.e., the 
search null-space) when subject to the current step-size mechanism. 
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Fig. 5. Experimental FOCAL conducted on the atomic Rubidium system. Panel (a) 
displays the optimization yield curves (normalized to the transform-limited pulse) for 
the def-CMA-FOCAL, sep-CMA-FOCAL, and the iso-CMA-FOCAL. While the curves 
of the def-CMA and sep-CMA practically merge, the iso-CMA routine fails to remain 
at the top of the landscape upon forcing a practical step within the FOCAL scheme. 
Panel (b) displays post-facto calculations concerning the behavior of the practical step- 
size, namely, 5p (Eg. |21[ ), ||Zix|| (empirical, as the distance between consecutive search 
points), as well as the upper and lower bounds (Eq. 27 1. The calculations correspond 
to the def-CMA-FOCAL run shown in panel (a). 



In order to supplement the evidence that FOCAL learns the optimal sam- 
pling distribution, it is insightful to additionally examine the post-facto practical 
step-sizes (shown in Figure [sjb)) taken across the fitness landscape for the def- 
CMA-driven FOCAL. The calculations involve (Eq. [211), ||44a;|| (empirical. 



as the distance between consecutive search points), as weh as the largest and 
smallest practical step-sizes (also referred to as the upper and lower bounds, re- 
spectively, corresponding to Eg. [27]). After ^ 1000 experimental evaluations, the 
upper and lower bounds begin to meaningfully diverge. The practical step-size 
continues to lie near the upper bound for the remainder of the optimization, and 
the proximity of the experimental steps to this upper bound indicates that the 
generated mutations lie predominantly in a space spanned by yield-insensitive 
search directions, which explains the minimal yield deviations seen in Figure 
[sja). In essence, this observation constitutes the ultimate demonstration of the 
principle FOCAL rationale. 

The resultant Hessian recovered from the experiment is displayed in Figure 
[6][a). The Hessian illuminates individual couplings within Rb, and its eigende- 
composition reveals the existence of only six important search directions (matrix 
rank), which explains the noise robustness observed in Figure [5ja) as the ma- 
jority of mutations lie within a null-space for the quantum system. In order to 
demonstrate the reliability of the FOCAL-identified Hessian, the eigenvectors 
corresponding to the five most yield-sensitive search directions (largest Hes- 
sian eigenvalues) are shown in Figure |6j[b). As seen in the figure, the recovered 
eigenvectors display prominent spectral structure at the known four resonant 
wavelengths of Rb in addition to the two-photon wavelength 778nm). Yet, 
each eigenvector does not simply correspond to a single resonant wavelength, 
but rather is a complicated superposition of features across the entire spectral 
region. These eigenvectors comprise a reduced-dimensional, optimal control ba- 
sis with which future optimizations or landscape studies may be performed. The 
eigenspectrum profile has also been successfully corroborated in two different 
manners: Firstly, by conducting a naive statistical Hessian measurement at the 
top of the landscape, involving «25,000 evaluations, and secondly, by applying 
PCA to the set of recorded search points at the final stage of the evolutionary 
search. 



5.2 Total-SHG 

The forced step ctq is set to 10% of [0, 2tt], a = 0.1, and a (15, 30) strategy is em- 
ployed. Figure [7](a) presents the fitness curves of def-CMA and sep-CMA driven 
FOCAL routines when applied to the optimization of the Total-SHG system. The 
two curves practically merge, as was also observed in the Rb optimization case. 
We omit details regarding the iso-CMA-FOCAL, as its behavior was inferior, 
in an equivalent manner to the reported observation on the Rb system. Fur- 
thermore, the calculated post-facto practical steps are depicted in Figure [7](b). 
Unlike the rank-deficient Rb problem, where the practical-steps approached their 
upper limit (see Figure |5jb)), the full-rank nature of the SHG problem forces 
the practical-steps to approach their lower limit, as clearly shown. This is an 
expected result, since the lack of a null space requires a more careful generation 
of mutations with a lower practical step. 

The resultant Hessian recovered from this experiment yielded a spectrum 
which accurately matches the analytical spectrum derived from theory (see Ap- 
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Fig. 6. Experimental Hessian for atomic Rubidium obtained by FOCAL. Panel (a) 
displays the Hessian matrix retrieved from inverting the FOCAL-optimized Covari- 
ance matrix. Panel (b) displays the eigenvectors corresponding to the five most yield- 
sensitive search directions. These eigenvectors exhibit prominent structure surrounding 
the four resonant wavelengths of rubidium, which are indicated with the colored bars. 




Fig. 7. Experimental FOCAL conducted on total-SHG. Panel (a) displays the op- 
timization yield curves (normalized to the transform-limited pulse) for the default 
CMA-ES (def-CMA) and the sep-CMA-ES routines, when a forced step tro is imposed. 
This plot confirms the successful convergence of both strategies. Panel (b) displays 
post- facto calculations concerning the behavior of the practical step-size, namely, 5p 
(Eg. \21\ , (empirical, as the distance between consecutive search points), as well 

as the upper and lower bounds (Eg. |27[ |. The calculations correspond to the def-CMA 
run shown in panel (a). Unlike the rank-deficient Rb problem, where the practical steps 
approach their upper bound (see Figure[5|b)), the full-rank nature of the SHG problem 
dictates a proclivity toward the lower bound, as clearly shown here. 



pendix), as shown in Figure |8] This perfect match constitutes another corrobora- 
tion of FOCAL's abihty to successfully operate in this high-dimensional regime 
and to extract complex information from experimental data. In Figure [8] we 
again see the failure of the default CMA-ES to uncover the correct spectrum. 
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Fig. 8. Experimental SHG Hessian spectrum recovered by FOCAL (bars), which 
matches the theoretical spectrum (curve) (for its derivation see the Appendix; note 
that the current scaling depicts the negative of the Hessian eigenvalues, as provided 
in Figure 10 1 . The default CMA-ES (stems) fails to identify the correct eigenspec- 
trum. This problem is known to be non-separable and possesses a highly-complex, 
non-quadratic landscape [44i. 



5.3 Learning Rate: The Condition Number 

An important measure of algorithmic behavior is the learning rate of the target 
Hessian matrix. This measure can be directly assessed by considering the con- 
dition number of the evolving covariance matrix (Eq. |28[ ) as a function of the 
number of experimental measurements (function evaluations). Figure [9] presents 
the experimental data with regard to the learning rate. The results show a linear 
increase of 

y^cond (C(5)) 



as a function of the number of experimental trials, for both the atomic Rb and 
Total-SHG systems. Overall, this observation reveals a very attractive expo- 
nential learning rate of the target matrix. As a comparison, moment-based 
methods typically possess a rate proportional to the square-root of the number 
of experimental trials. The complexity of the learning processes is reflected by 
the slopes of these curves: The rank-deficiency of the Rb system allows for a 
more rapid learning process in comparison to the Total-SHG problem with its 
full-rank Hessian. 




Fig. 9. The FOCAL learning rates, presented by the condition number as a func- 
tion of the number of experimental trials. Linear fits are obtained for jC{g) = 
logj^Q cond (C(£'))j , revealing exponential learning rates for def-CMA-FOCAL op- 
erating on the two experimental QC systems: [Left] Rb and [Right] Total-SHG. 



5.4 Spectrum Evolution 

Investigation of the evolving spectra during FOCAL runs on the two systems re- 
veals interesting information, regarding both the search landscape as well as the 
FOCAL machinery. The evolving experimental spectra were captured in movies, 
which are presented as online supplementary material |45]. One observation is 
the rapid evolution of the two spectra as a function of the search stage, which 
confirms their locality. In contrast to claims that the evolving covariance ma- 
trix within a self-adaptive ES is carrying non-local information, we observe that 
the resultant spectrum within the FOCAL mechanism is locally formed only at 
the top of the landscape, i.e., it is not significantly contaminated by non-local 
information left over by the evolution path, which constitutes a sliding average. 
Another observation concerns the convergence of the spectra. In the atomic Rb 
system, the learning procedure exploits the rank-deficiency and first identifies 
the null space. Equivalently, in the Total-SHG case the procedure first identifies 
the directions with the smallest magnitude Hessian eigenvalues, i.e., it exploits 
a quasi-null space for the construction of the spectrum. In the final convergence 



phase at the global maximum, it can be clearly seen for both systems that FO- 
CAL is picking up the directions with the largest magnitude Hessian eigenvalues, 
and gradually completes the spectra down to those with the lowest magnitude, 
in contrast to a possible scenario of simultaneous directional learning. This is 
an expected result, which can be considered as a learning process driven by a 
greedy selection pressure striving for the least yield declines. 

5.5 Parameter Settings 

In order to systematically characterize the role and the optimal values of the 
FOCAL parameters Ccov, a, and ao, we empirically studied FOCAL's behavior 
upon launching a series of simulations on model landscapes of varying ranks and 
dimensions. Table [T] presents the recommended parameter values for different 
problem scenarios, and below are some conclusions: 

— FOCAL's behavior is not highly sensitive to the CTq value. We recommend 
setting it to 5%-10% of the coordinate interval. 

— < a < 1/2 is mostly rank-dependent; the higher the rank, the lower its 
value should be set. This is expected, as it controls the spectral pressure on 
FOCAL's machinery. 

— 1/n < Ccov < 1 is both dimension- and rank-dependent. As the dimension 
and the rank increase, its value should be decreased. 



Table 1. Recommended FOCAL parameter settings, for Ccov and a, on a representative 
set of cases. 



Dimension 


Rank-Deficient 


Full-Rank 


Ccov 


a 


Ccov 


a 


n = 30 


0.10 


0.25 


0.08 


0.19 


n = 50 


0.08 


0.22 


0.06 


0.15 


n = 80 


0.07 


0.20 


0.04 


0.10 



Note that the table is practically symmetric with respect to the role played by 
either the rank or the dimension: high-dimensional problems with rank-deficiency 
are empirically equivalent to low-dimensional full-rank problems. 

6 Discussion 

We have introduced a new technique, entitled FOCAL, for extraction of second- 
order landscape information at the global basin of attraction, which is based 
upon the CMA-ES search-engine. We derived fundamental principles for the in- 
verse relation between the statistically learned ES covariance matrix and the 



Hessian of the landscape. An algorithm was formulated, which forces the ES- 
kernel to explore the top of the landscape toward the end of an optimization 
routine efficiently determining the Hessian. We described the method in detail, 
while deriving bounds for the characteristic adaptive strategy parameters, which 
allow for better mechanism understanding. In summary, we claim that the in- 
ability of conventional ES to learn the correct covariance matrix under certain 
conditions most likely lies in the global step-size behavior. Consequently, the 
primary contribution of FOCAL is the introduction of an alternative step-size 
mechanism specifically targeting Hessian determination. Within this scheme, the 
spectrum-dependency of FOCAL's step-size update seems to play a crucial role 
in the method's capability to learn the nature of the top of the landscape, es- 
pecially when operating under conditions of large condition numbers > 10"^). 
This method also introduced several user-defined parameters along with pro- 
posed rules of thumb for scitting those parameters. Upon utilizing the proposed 
values, FOCAL was demonstrated to perform extremely well over a broad ex- 
perimental platform. Further study of the optimal FOCAL parameter settings 
may constitute an important direction for future research. 

Following a demonstration of a practical scenario in which the default CMA- 
ES does not learn the correct Hessian spectrum of a basic model landscape, 
the newly proposed FOCAL method was shown to easily obtain the known 
Hessian. It was then applied to the experimental platform of QC systems. We 
reported on the laboratory success of FOCAL to extract Hessian forms of several 
systems. Those experimental results reliably matched the Hessian forms expected 
from quantum mechanical theory, and therefore constituted an experimental 
corroboration for the high fidelity of the FOCAL procedure. The learning rate 
of FOCAL was observed to be exponential in terms of algorithmic iterations (or 
function evaluations), introducing an additional attractive feature of this novel 
technique. 

This study oH'ers an additional important observation with regard to the na- 
ture of the landscapes of the investigated QC systems. Simple intuition might 
suggest that the ease of optimizing QC systems is due to their possessing spher- 
ical landscapes. This study clearly refutes this behavior and explicitly demon- 
strates the non-spherical nature of the investigated QC landscapes. 

There arc two primary directions for future research. One is the investigation 
of possible spectrum compression by means of a first-order DES (a strategy em- 
ploying a diagonalized covariance matrix, e.g., sep-CMA-ES). Since the deploy- 
ment of such a strategy was demonstrated to be successful within the FOCAL 
framework in terms of convergence, it would be interesting to rigorously study 
the nature of the attained compressed spectra, and ideally draw an analytical 
link to the actual landscape information. Another direction would be the char- 
acterization of the problem conditions under which conventional ES succeed or 
fail to learn the correct covariance matrix reflective of the search landscape. 
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Appendix: Derivation of the SHG Hessian 



The Hessian for Second Harmonic Generation (SHG) is given by 



S(l){uj') S(t){uj") 



(36) 



</>*(cj)=0 



where 4>{oj) is the spectral phase applied to the pulse shaper. The SHG signal 
"^SHG is provided by 



JSHG= / \E2{UJ)\^ diO, (37) 
J —oo 

where £'2(0;) — E{uj')E{uj — ijj')duj' and the complex spectral field E{uj) = 
A{uj) exp[i0(iLij)] is given in terms of the laser spectral amplitude A{uj) and applied 
phase </'(w). The spectral amplitude A(uj) is taken as real, positive, and related 
to the experimentally measured spectral intensity |i<^(aj)p through |£'(w)p = 
A{uj)'^. The second-order functional derivative of Jshg may then be written as 



(38) 

Variational analysis is utilized to derive the relevant functional derivatives of the 
complex spectral field: 



'^=2^EiJ)Ei.-.') 



S^E^jtu) 
S(j}{uj') 6<j){uj") 



-2E{uj')E{uj - uj') [6{oj' - uj") + 6{uj - J - uj")] 



Substitution of these two relations into Eq. 38 and evaluation with the optimal 
phase {(j)* (w) = 0) finally leads to a SHG Hessian of the form 



li{uj',uj") = - 4 A{uj')S{uj' - uj") J J dujdzA{z)A{uj - uj')A{uj - z) 

— OO 
OO 

- A A{uj')A{uj") [ dujA{uj)A{uj' +to" -oj) 



-OO 

OO 



+ 8 A{uj')A{lo") / dujA{uj-uj')A{uo-Lo"). (39) 



The Hessian may be cast into a more tractable expression by considering an ex- 
perimentally relevant spectral amplitude of the form A{uj) — exp[— 2 In 2{u'^ / A^^^y^) 
where Z\fwhm is the spectral intensity FWHM. Following this assumption, the 



integrals of Eq. 39 may be analytically evaluated, and the Hessian becomes 



H(a;',..") =-4 ^1^^ A(..') exph^ (c^'V^Iwhm)] - J') 



- 4 Z\FWHMy A{uj')A{uj") exp[-(w' - uj" f / A^^^^y^] 
+ 8 ^FWHM^^ A{u:')A{lo") exp[-(c^' + c^")V^Iwhm]- (40) 
This Hessian is evaluated over the frequency domain, 

w e [—1.5 • ZipwHM, 1-5 • Z\fwhm] , 

which corresponds to the experimental width of the pulse shaper pixel array 
(the optics of the pulse shaper are designed so that the pixel array width is 



3 • Z\fwhm)- The resulting Hessian is depicted in Figure 10 It displays both 
positive (diagonal) and negative (off-diagonal) correlations amongst frequency 
components. 
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